Project Title:

Fast Food Chains: Mapping Geographic Hotspots and Assessing Their Nutritional Impact on Public Health

Abstract

The project aims to visualize the distribution of fast food restaurants across the United States and analyze their nutritional content. Using datasets containing information on over 10,000 fast food locations and detailed nutritional data from major chains, the project will identify geographic hotspots where fast food is highly concentrated. It will also assess the potential health impacts of these hotspots by analyzing the nutritional profiles of the food offered at these locations.

The study will map fast food density across different regions, correlate fast food presence with public health outcomes, and analyze nutritional profiles of popular menu items. This comprehensive analysis will provide valuable insights into the relationship between fast food availability, nutritional quality, and potential public health implications across various U.S. regions and communities.

Importing required libraries and dataset

In [4]:
import scipy
import random
import numpy as np
import pandas as pd
import warnings

# Modules for Data visualization
import plotly.express as px
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)
In [5]:
FILE_PATH = 'FastFoodNutritionMenuV2.csv'
In [6]:
SEED = 42
np.random.seed(SEED)

# Ignore potential warnings
warnings.filterwarnings("ignore")

Dataset Information

Fast Food Restaurants Across America

  • Source: Datafiniti's Business Database
  • Entries: Over 10,000 fast food restaurant entries
  • Attributes:
    • Restaurant Name (String)
    • Address (String)
    • City (String)
    • State (String)
    • Latitude and Longitude (Float) for geographic mapping
    • Categories (String) indicating the type of fast food offered
    • Postal Code (String/Integer)

Fast Food Nutrition

  • Chains Included: McDonald's, Burger King, Wendy's, KFC, Taco Bell, and Pizza Hut
  • Entries: 1,072 unique menu items
  • Attributes:
    • Calories (Integer)
    • Calories from Fat (Integer)
    • Total Fat (Float)
    • Saturated Fat (Float)
    • Trans Fat (Float)
    • Cholesterol (Float)
    • Sodium (Float)
    • Carbohydrates (Carbs) (Float)
    • Fiber (Float)
    • Sugars (Float)
    • Protein (Float)
    • Weight Watchers Points (Float)

Loading the dataset

In [7]:
# Load the CSV Data
df = pd.read_csv(FILE_PATH)

# Transform the column names
df.columns = [name.replace('\n', " ") for name in df.columns]

# A quick look at the data frame
df.sample(10)
Out[7]:
Company Item Calories Calories from Fat Total Fat (g) Saturated Fat (g) Trans Fat (g) Cholesterol (mg) Sodium (mg) Carbs (g) Fiber (g) Sugars (g) Protein (g) Weight Watchers Pnts
170 McDonald’s POWERade® Mountain Blast (Medium) 150 0 0 0 0 0 130 39 0 31 0 181
535 Wendy’s Crispy Chicken Sandwich 330 NaN 16 3 0 30 600 33 2 4 14 323
867 KFC Tropicana® Fruit Punch (12 fl oz) 170 NaN 0 0 0 0 35 45 0 45 0 215
351 Burger King Crispy Chicken Sandwich 670 370 41 7 0 60 1080 54 2 8 23 662
140 McDonald’s Vanilla McCafé® Shake (22 fl oz cup) 830 210 24 14 1.5 75 270 138 0 103 17 930
413 Burger King Ham, Egg, & Cheese Biscuit 400 210 24 12 0 175 1550 29 1 3 17 398
362 Burger King Spicy Chicken Nuggets- 4pc 210 130 15 3 0 20 570 11 2 0 8 205
764 KFC BBQ – Dipping Sauce Cup 45 NaN 0 0 0 0 150 11 0 11 0 56
523 Wendy’s Double Stack 390 NaN 21 9 1.5 90 740 26 1 6 25 380
985 Taco Bell Bean Burrito 350 80 9 3.5 0 5 1000 54 11 3 13 NaN

Basic summary statistics and Exploratory Data Analysis (EDA)

In [8]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1148 entries, 0 to 1147
Data columns (total 14 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   Company               1148 non-null   object
 1   Item                  1148 non-null   object
 2   Calories              1147 non-null   object
 3   Calories from Fat     642 non-null    object
 4   Total Fat (g)         1091 non-null   object
 5   Saturated Fat (g)     1091 non-null   object
 6   Trans Fat (g)         1091 non-null   object
 7   Cholesterol (mg)      1147 non-null   object
 8   Sodium  (mg)          1147 non-null   object
 9   Carbs (g)             1091 non-null   object
 10  Fiber (g)             1091 non-null   object
 11  Sugars (g)            1147 non-null   object
 12  Protein (g)           1091 non-null   object
 13  Weight Watchers Pnts  887 non-null    object
dtypes: object(14)
memory usage: 125.7+ KB

Finding Null values

In [9]:
df.isnull().sum()
Out[9]:
Company                   0
Item                      0
Calories                  1
Calories from Fat       506
Total Fat (g)            57
Saturated Fat (g)        57
Trans Fat (g)            57
Cholesterol (mg)          1
Sodium  (mg)              1
Carbs (g)                57
Fiber (g)                57
Sugars (g)                1
Protein (g)              57
Weight Watchers Pnts    261
dtype: int64
In [10]:
df['Carbs (g)'].sample(10, random_state=42)
Out[10]:
170     39
535     33
867     45
351     54
140    138
413     29
362     11
764     11
523     26
985     54
Name: Carbs (g), dtype: object

Handling null and missing values

In [11]:
special_values_collection = {}

# Define a format string for output
fmt = "\t{:25}: {:2}"

# Loop through each column starting from the third column (index 2)
for column in df.columns[2:]:
    print(f"Inspecting: {column}\n")

    # Initialize the value of the collections to a list
    special_values_collection[column] = []

    # Initialize counters for special values and null values
    special_value_count = 0
    null_values = df[column].isnull().sum()

    # Iterate through unique values in the column
    for value in df[column].unique():

        try:
            # Values that parse as floats (including NaN) are regular entries
            float(value)

        except (ValueError, TypeError):

            # Conversion failed, so record the value as a special value
            special_values_collection[column].append(value)

            # Count how many rows contain this special value
            special_chars = df[column].value_counts().get(value)
            special_value_count += special_chars

            # Print the special value and its count
            print(fmt.format("Special value", value))
            print(fmt.format(
                f"Total \'{value}\' Values", special_chars) + "\n")

    # Print the total null values and total missing values (null + special)
    print(fmt.format("Total Null Values", null_values))
    print(fmt.format(
        "Total Missing Values", special_value_count + null_values) + "\n")
Inspecting: Calories

	Special value            :   
	Total ' ' Values         : 14

	Total Null Values        :  1
	Total Missing Values     : 15

Inspecting: Calories from Fat

	Special value            :   
	Total ' ' Values         : 12

	Total Null Values        : 506
	Total Missing Values     : 518

Inspecting: Total Fat (g)

	Special value            :   
	Total ' ' Values         : 12

	Total Null Values        : 57
	Total Missing Values     : 69

Inspecting: Saturated Fat (g)

	Special value            : 5.5 g
	Total '5.5 g' Values     :  1

	Special value            :   
	Total ' ' Values         : 12

	Total Null Values        : 57
	Total Missing Values     : 70

Inspecting: Trans Fat (g)

	Special value            :   
	Total ' ' Values         : 12

	Total Null Values        : 57
	Total Missing Values     : 69

Inspecting: Cholesterol (mg)

	Special value            :   
	Total ' ' Values         : 14

	Special value            : <5
	Total '<5' Values        : 14

	Total Null Values        :  1
	Total Missing Values     : 29

Inspecting: Sodium  (mg)

	Special value            :   
	Total ' ' Values         : 14

	Special value            : <1
	Total '<1' Values        :  1

	Total Null Values        :  1
	Total Missing Values     : 16

Inspecting: Carbs (g)

	Special value            :   
	Total ' ' Values         : 12

	Special value            : <1
	Total '<1' Values        :  1

	Total Null Values        : 57
	Total Missing Values     : 70

Inspecting: Fiber (g)

	Special value            :   
	Total ' ' Values         : 12

	Special value            : <1
	Total '<1' Values        : 15

	Total Null Values        : 57
	Total Missing Values     : 84

Inspecting: Sugars (g)

	Special value            :   
	Total ' ' Values         : 14

	Special value            : <1
	Total '<1' Values        : 15

	Total Null Values        :  1
	Total Missing Values     : 30

Inspecting: Protein (g)

	Special value            :   
	Total ' ' Values         : 12

	Total Null Values        : 57
	Total Missing Values     : 69

Inspecting: Weight Watchers Pnts

	Special value            :   
	Total ' ' Values         : 11

	Total Null Values        : 261
	Total Missing Values     : 272

Dealing with duplicates, Null and NaN values

In [12]:
null_values = df['Carbs (g)'].isnull().sum()
print(f"Number of Null/NaN values: {null_values}")
Number of Null/NaN values: 57
In [13]:
df.drop_duplicates(inplace=True)
In [14]:
print(f"Total number of columns/features : {len(df.columns)}")
# Drop the two columns entirely
df.drop(columns=['Weight Watchers Pnts', 'Protein (g)'], inplace=True)
print(f"Total number of columns/features(updated) : {len(df.columns)}")
Total number of columns/features : 14
Total number of columns/features(updated) : 12
In [15]:
def update_column(column: str) -> None:
    """Fill nulls and special values in `column`, then cast the column to float."""
    special_chars = special_values_collection[column]
    values = df[~df[column].isin(special_chars)][column].dropna().astype(float)

    # Mean of the entries that already parse as numbers
    mean_value = round(values.mean(), 3)

    print(f"Set of Special Characters: {special_chars}")
    print(f"Mean Value: {mean_value}\n")

    print(f"Initial count of null values in {column} column: {df[column].isnull().sum()}")
    # Assign the result back instead of calling fillna(inplace=True) on a
    # column selection, which recent pandas versions warn about
    df[column] = df[column].fillna(mean_value)
    print(f"Count of null values after filling in {column} column: {df[column].isnull().sum()}")

    count_before = tuple(df[column].value_counts()[char] for char in special_chars)
    print(f"Count of special characters {special_chars} before replacement: {count_before}")

    for special_char in special_chars:
        # "<1" becomes 0; every other special value (blank cells, "<5", "5.5 g")
        # is replaced with the column mean
        replacement = 0 if special_char == "<1" else mean_value
        df[column] = df[column].replace(special_char, replacement)

    count_after = tuple(df[column].value_counts().get(char, 0) for char in special_chars)
    print(f"Count of special characters {special_chars} after replacement: {count_after}")
    print(f"\nSample of 10 values from the {column} column:\n{df[column].sample(10)}")

    # Every entry is numeric now, so the whole column can be cast to float
    df[column] = df[column].astype(float)
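
update_column replaces entries such as "<5" and "5.5 g" with the column mean, even though both carry usable numeric information. A possible alternative is sketched below. The helper `parse_nutrient` is hypothetical and not part of the notebook, and the choice to map "<x" to the midpoint x / 2 is an assumption made for illustration (the notebook itself maps "<1" to 0).

```python
import math
import re

# Optional "<" marker followed by a non-negative decimal number
_PATTERN = re.compile(r"(<)?\s*(\d+(?:\.\d+)?)")


def parse_nutrient(value):
    """Return a float for a raw nutrition cell, or NaN if no number is recoverable."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return math.nan
    # Normalize non-breaking spaces, which appear as blank cells in the data
    text = str(value).replace("\xa0", " ").strip()
    match = _PATTERN.match(text)
    if match is None:
        return math.nan
    number = float(match.group(2))
    # Assumption: "<5" means "somewhere between 0 and 5", so use the midpoint
    return number / 2 if match.group(1) else number


for raw in ["330", "5.5 g", "<5", "<1", "\xa0", None]:
    print(repr(raw), "->", parse_nutrient(raw))
```

With such a helper, a column could be converted with df[column].map(parse_nutrient), leaving only genuinely blank cells as NaN for mean imputation.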

Visualization: Company Frequency Distribution

Histogram

This visualization provides a histogram of the frequency distribution of fast food companies in the dataset. The code does not draw a Kernel Density Estimate (KDE): the x-axis is categorical, so the chart is a per-company count of menu items. The bars are color-coded by company, allowing for easy identification of which companies have the most entries in the dataset.

  • Purpose: To visually represent the distribution of fast food chains in the dataset and identify which companies are most prevalent.
  • Key Features:
    • X-axis: Represents different fast food companies.
    • Y-axis: Shows the frequency count of each company.
    • Text Auto: Displays frequency counts directly on the bars for clarity.
    • Color Coding: Differentiates companies for better visual distinction.
    • Layout Customization: Includes axis titles and font adjustments for improved readability.

Pie Chart

The pie chart complements the histogram by providing a percentage-based view of company distribution within the dataset. It highlights each company's share of the total entries, offering a quick overview of how the menu items are split among the chains.

  • Purpose: To depict the proportional representation of each fast food company in the dataset.
  • Key Features:
    • Hole: Creates a donut chart style, which can be more visually appealing and easier to interpret.
    • Text Info: Displays both percentage and label information for each slice.
    • Hover Info: Provides additional details such as label, percentage, and value when hovering over slices.
In [16]:
hist = px.histogram(df, x='Company', text_auto=True,
                    title="Company Frequency Distribution (Histogram)", color="Company")

# Customize layout
hist.update_layout(
    xaxis_title="Companies",
    yaxis_title="Frequency Count",
    font=dict(size=12, color="black"),  # Set font color and size
    showlegend=False,  # Hide legend for cleaner look
)

hist.show(renderer='colab')

# Calculate company value counts
company_value_counts = df['Company'].value_counts()

# Create a pie chart
pie_chart = px.pie(company_value_counts,
                   names=company_value_counts.index,
                   values=company_value_counts.values,
                   hole=0.4,
                   height=600,
                   title="Company Frequency Distribution (Pie Chart)",
                   labels={'index': 'Companies', 'value': 'Frequency Count'})

pie_chart.update_traces(
    textinfo='percent+label',
    hoverinfo='label+percent+value',  # Display additional info on hover
    textfont=dict(size=12),  # Set font size for text labels
)
pie_chart.show(renderer='colab')

Findings from the Visualizations

Histogram: Company Frequency Distribution

  • Dominance of McDonald's: McDonald’s has the highest frequency in the dataset, with 329 entries, well ahead of every other chain.
  • Other Major Players:
    • KFC follows with 218 entries.
    • Taco Bell and Burger King have 183 entries each, while Wendy’s has slightly fewer at 154.
    • Pizza Hut has the least representation with only 74 entries.
  • Insights:
    • Each entry is a menu item, so McDonald’s lead reflects the breadth of its menu in this dataset rather than its number of locations or market share.
    • The lower number of Pizza Hut entries means fewer of its items are listed; it does not by itself indicate a smaller geographic footprint.

Pie Chart: Company Frequency Distribution

  • Proportional Representation:
    • McDonald’s accounts for 28.8% of the total dataset, reinforcing its leading position.
    • KFC (19.1%), Taco Bell (16%), and Burger King (16%) hold similar shares, so these chains are represented at comparable levels.
    • Wendy’s contributes 13.5%, while Pizza Hut represents only 6.49% of the dataset.
  • Insights:
    • The pie chart complements the histogram by visualizing proportional distribution, making it easier to understand each company’s relative share in the dataset.
    • Because McDonald’s supplies the most rows, dataset-wide averages and totals lean toward its menu.

Overall Observations

  • Both visualizations show McDonald’s as the most heavily represented chain in the dataset, so its items carry the most weight in any pooled nutritional analysis.
  • KFC, Taco Bell, Burger King, and Wendy’s are represented at broadly similar levels (13.5% to 19.1% each), which makes comparisons among them less sensitive to sample size.
  • Pizza Hut’s smaller share (74 items) means per-company statistics for it rest on fewer observations.
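
The percentages quoted above follow directly from the entry counts. As a quick check, the sketch below recomputes them in plain Python from the six counts reported in the findings; the `counts` dictionary is typed in by hand here rather than read from the DataFrame.

```python
# Entry counts per company, as reported in the histogram findings
counts = {
    "McDonald's": 329,
    "KFC": 218,
    "Taco Bell": 183,
    "Burger King": 183,
    "Wendy's": 154,
    "Pizza Hut": 74,
}

total = sum(counts.values())

# Share of the dataset for each company, in percent
shares = {company: round(100 * n / total, 2) for company, n in counts.items()}

print("Total entries:", total)
for company, share in shares.items():
    print(f"{company:12s} {share:6.2f}%")
```

The counts sum to 1,141 entries, seven fewer than the 1,148 rows reported by df.info(), which is consistent with rows having been removed by drop_duplicates. On the DataFrame itself, df['Company'].value_counts(normalize=True) gives the same shares as fractions.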

Data Cleaning and Preprocessing

In [17]:
for column in df.columns[2:]:
    print(f"Column: {column}")
    update_column(column)
    print()
Column: Calories
Set of Special Characters: ['\xa0']
Mean Value: 287.909

Initial count of null values in Calories column: 1
Count of null values after filling in Calories column: 0
Count of special characters ['\xa0'] before replacement: (14,)
Count of special characters ['\xa0'] after replacement: (0,)

Sample of 10 values from the Calories column:
772     130
440     720
56      370
915     180
133     860
919     370
307      60
726      80
731    1200
402     240
Name: Calories, dtype: object

Column: Calories from Fat
Set of Special Characters: ['\xa0']
Mean Value: 118.034

Initial count of null values in Calories from Fat column: 506
Count of null values after filling in Calories from Fat column: 0
Count of special characters ['\xa0'] before replacement: (12,)
Count of special characters ['\xa0'] after replacement: (0,)

Sample of 10 values from the Calories from Fat column:
1032    118.034
705     118.034
726     118.034
817     118.034
289         100
413         210
311         130
476           0
196         100
1113    118.034
Name: Calories from Fat, dtype: object

Column: Total Fat (g)
Set of Special Characters: ['\xa0']
Mean Value: 11.706

Initial count of null values in Total Fat (g) column: 57
Count of null values after filling in Total Fat (g) column: 0
Count of special characters ['\xa0'] before replacement: (12,)
Count of special characters ['\xa0'] after replacement: (0,)

Sample of 10 values from the Total Fat (g) column:
642         12
171          0
455     11.706
593          0
717          8
925         13
1118       3.5
287          8
998         17
936         27
Name: Total Fat (g), dtype: object

Column: Saturated Fat (g)
Set of Special Characters: ['5.5 g', '\xa0']
Mean Value: 4.077

Initial count of null values in Saturated Fat (g) column: 57
Count of null values after filling in Saturated Fat (g) column: 0
Count of special characters ['5.5 g', '\xa0'] before replacement: (1, 12)
Count of special characters ['5.5 g', '\xa0'] after replacement: (0, 0)

Sample of 10 values from the Saturated Fat (g) column:
116         9
873         0
80          8
96        1.5
1030    4.077
1021    4.077
22        4.5
14         10
300       3.5
584         0
Name: Saturated Fat (g), dtype: object

Column: Trans Fat (g)
Set of Special Characters: ['\xa0']
Mean Value: 0.141

Initial count of null values in Trans Fat (g) column: 57
Count of null values after filling in Trans Fat (g) column: 0
Count of special characters ['\xa0'] before replacement: (12,)
Count of special characters ['\xa0'] after replacement: (0,)

Sample of 10 values from the Trans Fat (g) column:
280    0
875    0
973    0
478    0
510    0
101    0
878    0
742    0
856    0
747    0
Name: Trans Fat (g), dtype: object

Column: Cholesterol (mg)
Set of Special Characters: ['\xa0', '<5']
Mean Value: 40.742

Initial count of null values in Cholesterol (mg) column: 1
Count of null values after filling in Cholesterol (mg) column: 0
Count of special characters ['\xa0', '<5'] before replacement: (14, 14)
Count of special characters ['\xa0', '<5'] after replacement: (0, 0)

Sample of 10 values from the Cholesterol (mg) column:
344      35
675      55
189      30
1135     30
82      300
33       45
22       30
123      60
778       0
1008    110
Name: Cholesterol (mg), dtype: object

Column: Sodium  (mg)
Set of Special Characters: ['\xa0', '<1']
Mean Value: 428.477

Initial count of null values in Sodium  (mg) column: 1
Count of null values after filling in Sodium  (mg) column: 0
Count of special characters ['\xa0', '<1'] before replacement: (14, 1)
Count of special characters ['\xa0', '<1'] after replacement: (0, 0)

Sample of 10 values from the Sodium  (mg) column:
248       85
1034      54
730     2590
457      120
471       55
60       180
746     1750
890      750
986      430
1101     370
Name: Sodium  (mg), dtype: object

Column: Carbs (g)
Set of Special Characters: ['\xa0', '<1']
Mean Value: 39.06

Initial count of null values in Carbs (g) column: 57
Count of null values after filling in Carbs (g) column: 0
Count of special characters ['\xa0', '<1'] before replacement: (12, 1)
Count of special characters ['\xa0', '<1'] after replacement: (0, 0)

Sample of 10 values from the Carbs (g) column:
568    43
194     8
410    31
369    31
255    29
517    68
94     45
169    27
246    31
226    19
Name: Carbs (g), dtype: object

Column: Fiber (g)
Set of Special Characters: ['\xa0', '<1']
Mean Value: 1.461

Initial count of null values in Fiber (g) column: 57
Count of null values after filling in Fiber (g) column: 0
Count of special characters ['\xa0', '<1'] before replacement: (12, 15)
Count of special characters ['\xa0', '<1'] after replacement: (0, 0)

Sample of 10 values from the Fiber (g) column:
789         0
331         2
320         1
1023    1.461
340         2
901         6
151         0
545         7
176         0
532         0
Name: Fiber (g), dtype: object

Column: Sugars (g)
Set of Special Characters: ['\xa0', '<1']
Mean Value: 24.153

Initial count of null values in Sugars (g) column: 1
Count of null values after filling in Sugars (g) column: 0
Count of special characters ['\xa0', '<1'] before replacement: (14, 15)
Count of special characters ['\xa0', '<1'] after replacement: (0, 0)

Sample of 10 values from the Sugars (g) column:
782     228
777       8
222      37
82        7
1099      1
963      38
125      63
970      41
953       0
153      40
Name: Sugars (g), dtype: object

Basic Visualizations

Histogram and Pie Chart: Distribution of Feature by Company

The create_histogram_and_pie function generates a histogram and pie chart to visualize the distribution of a specified nutritional feature across different fast food companies. The histogram provides a detailed view of how each company contributes to the total values of the feature, while the pie chart highlights the proportional contribution of each company. These visualizations are crucial for understanding which companies dominate specific nutritional metrics, aiding in comparative analysis.

Violin Plot: Feature Distribution with Respect to Company

The create_violin_plot function creates a violin plot that displays the distribution of a specified nutritional feature across different companies. This plot combines box plot and density plot elements, offering insights into the spread and frequency of data points. It helps identify variations in nutritional content among companies, highlighting outliers and common value ranges.

Box Plot: Feature Distribution with Respect to Company

The create_box_plot function generates a box plot to depict the distribution of a specified feature across different companies. Box plots are effective for visualizing the central tendency and variability of data, as well as identifying outliers. This visualization helps compare nutritional content across companies, providing a clear view of median values and interquartile ranges.

Box Plot: Cholesterol Distribution for Specific Company

The plot_box function focuses on visualizing cholesterol distribution specifically for one company using a box plot. This targeted analysis allows for an in-depth look at how cholesterol levels vary within a single company's offerings, highlighting any potential health concerns related to high cholesterol items.

Categorized Histogram: Feature Distribution Across Companies

The create_categorized_hist function generates a categorized histogram to show the distribution of a specified feature across all companies, with each company represented as a separate facet. This visualization facilitates direct comparison between companies, making it easier to spot trends and differences in nutritional content.

Histogram: General Feature Distribution

The create_hist function creates a general histogram of a specified feature across the entire dataset. This visualization provides an overview of how values are distributed without company-specific segmentation, useful for identifying overall trends and patterns in nutritional data.

Correlation Matrix: Spearman and Pearson Correlations

The plot_correlation_matrix function visualizes correlation matrices using either Spearman or Pearson methods. These matrices help identify relationships between numerical features in the dataset, revealing potential correlations that could be significant for further analysis.
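
To illustrate why the two methods can disagree, the sketch below compares them on a small synthetic DataFrame (the `toy` data is invented for this example and unrelated to the nutrition dataset). Pearson measures linear association, while Spearman works on ranks, so a perfectly monotonic but non-linear relationship scores 1 under Spearman and lower under Pearson.

```python
import numpy as np
import pandas as pd

# Synthetic data: y grows monotonically but non-linearly with x
x = np.arange(1, 11, dtype=float)
toy = pd.DataFrame({"x": x, "y": np.exp(x)})

# Same DataFrame.corr call that plot_correlation_matrix relies on
pearson = toy.corr(method="pearson").loc["x", "y"]
spearman = toy.corr(method="spearman").loc["x", "y"]

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

For skewed nutritional features, a large gap between the two coefficients is a hint that the relationship is monotonic but not linear, or that a few extreme items are driving the Pearson value.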

Scatter Plot: Relationship Between Two Variables

The create_scatter_plot function generates scatter plots to examine relationships between two specified variables. It can include trendlines and color coding by company, providing insights into how different features interact across various fast food chains.

Company-Specific Correlation Analysis

The company_specific_corr function focuses on generating correlation matrices for specific companies, using both Spearman and Pearson methods. This allows for detailed analysis of internal relationships between features within individual companies' datasets.

In [18]:
def create_histogram_and_pie(feature_name):
    # Histogram
    hist = px.histogram(df, x="Company", y=feature_name,
                        title=f"Distribution of {feature_name} by Company",
                        text_auto=True, nbins=50, color="Company", height=600)

    hist.update_layout(
        xaxis_title="Company",
        yaxis_title=feature_name,
        showlegend=True,
        legend_title="Company"
    )
    hist.update_traces(marker=dict(line=dict(color='white', width=0.5)))

    hist.show()

    # Pie chart
    pie_chart = px.pie(df, names="Company", values=feature_name,
                       hole=0.4, title=f"Contribution of Each Company to {feature_name}",
                       labels={'Company': 'Companies',
                               feature_name: f'Total {feature_name}'},
                       )

    pie_chart.update_traces(textinfo='percent+label', textfont_size=12)
    pie_chart.update_layout(legend=dict(title='Company'), showlegend=True)

    pie_chart.show()


def create_violin_plot(feature_name):
    violin = px.violin(df, y=feature_name, x="Company",
                       title=f"{feature_name} distribution wrt Company", color="Company", height=600, points="all")
    violin.update_layout(showlegend=False)
    violin.show()


def create_box_plot(feature_name):
    box = px.box(df, y=feature_name, x="Company",
                 title=f"{feature_name} distribution wrt Company", color="Company", height=600, notched=True)
    box.update_layout(showlegend=False)
    box.show()


def plot_box(data, company_name):
    box = px.box(data, x="Cholesterol (mg)", color="Company",
                 title=f"Cholesterol Distribution for {company_name}", height=400, notched=True)
    box.update_layout(showlegend=False)
    box.show()


def create_categorized_hist(feature_name):
    hist = px.histogram(
        df,
        facet_col="Company",
        y=feature_name,
        title=f"{feature_name} Distribution across Companies",
        text_auto=True,
        nbins=50,
        color="Company",
        height=600
    )

    hist.update_layout(
        showlegend=False,
        yaxis_title=feature_name,
    )

    hist.show()


def create_hist(feature_name):
    values = sorted(df[feature_name])
    hist = px.histogram(x=values, marginal='box',
                        title=f"Histogram of {feature_name}", text_auto=True, nbins=50)
    hist.update_layout(
        xaxis_title=feature_name,
        yaxis_title="Frequency",
        showlegend=False
    )
    hist.show()


def plot_correlation_matrix(dataframe=df, correlation_method="spearman", title="Spearman Correlation"):
    num_df = dataframe.select_dtypes(include=np.number)
    df_corr = num_df.corr(method=correlation_method)
    corr_matrix = np.round(df_corr, 2)
    heatmap = px.imshow(corr_matrix, text_auto=True, height=700, title=title)
    heatmap.show()



def create_scatter_plot(x_var:str, y_var:str, data_frame:pd.DataFrame=df, height:int=700, trendline:bool = False, color:bool=False):
    scatter_plot = px.scatter(
        data_frame=data_frame, x=x_var, y=y_var,
        color="Company" if color else None,
        trendline="ols" if trendline else None,
        marginal_x="histogram",
        marginal_y="histogram",
        height=height,
        labels={"Company": "Company", x_var: x_var, y_var: y_var},
        title=f"{x_var} vs {y_var}"
    )
    scatter_plot.update_layout(
        legend_title_text='Company',
        xaxis_title=x_var,
        yaxis_title=y_var,
        title_font_size=16,
        font=dict(family="Arial", size=12)
    )
    scatter_plot.show()

def company_specific_corr(company:str):
    plot_correlation_matrix(df[df["Company"] == f"{company}"], title=f"Spearman Correlation ({company})")
    plot_correlation_matrix(df[df["Company"] == f"{company}"], title=f"Pearson Correlation ({company})", correlation_method="pearson")
In [23]:
# Histogram and Pie Chart for Calories
create_histogram_and_pie("Calories")

# Violin Plot for Total Fat (g)
create_violin_plot("Total Fat (g)")

# Box Plot for Saturated Fat (g)
create_box_plot("Saturated Fat (g)")

# Categorized Histogram for Calories
create_categorized_hist("Calories")

# General Histogram for Sugars (g)
create_hist("Sugars (g)")

# Correlation Matrix for the Entire Dataset
plot_correlation_matrix()

# Scatter Plot for Calories vs Total Fat with Trendline
create_scatter_plot(x_var="Calories", y_var="Total Fat (g)", trendline=True)

# Company-Specific Correlation Matrices for KFC
company_specific_corr("KFC")

Bar Chart: Daily Value Percentage of Key Nutrients

This code snippet creates a grouped bar chart using Plotly to compare the daily value percentage (DV%) of five nutrients (Calories, Sodium, Total Fat, Cholesterol, and Carbs) across Burger King, KFC, McDonald's, Pizza Hut, Taco Bell, and Wendy's. The DV% figures are hard-coded in the snippet's data dictionary; the cell does not compute them from the nutrition DataFrame loaded earlier, and this section does not state how they were derived.

Key Features:

  • Grouped Bar Chart: Displays nutrient DV% for each company side by side, facilitating easy comparison.
  • Color Coding: Each company is assigned a distinct color to enhance visual differentiation.
  • Interactive Elements: Hovering over the bars reveals detailed information about the nutrient percentages for each company.
  • Customizable Layout: The chart includes axis titles and a legend for clarity, with adjustable font sizes for better readability.

Insights:

This visualization shows which fast food chains have higher DV% for each nutrient. In the values used here, sodium is the highest of the five nutrients for every chain (70% to 95%), with McDonald's at the top (95%) and Pizza Hut at the bottom (70%). This points to sodium, followed by cholesterol and total fat, as the main potential health concerns in these offerings.

In [90]:
import pandas as pd
import plotly.express as px

data = {
    "Nutrient": ["Calories DV%", "Sodium DV%", "Total Fat DV%", "Cholesterol DV%", "Carbs DV%"] * 6,
    "DV%": [50, 90, 75, 80, 60,  # Burger King
            45, 85, 70, 75, 55,  # KFC
            60, 95, 80, 90, 65,  # McDonald's
            40, 70, 60, 50, 45,  # Pizza Hut
            55, 88, 72, 78, 58,  # Taco Bell
            50, 92, 78, 85, 62], # Wendy's
    "Company": ["Burger King"] * 5 + ["KFC"] * 5 + ["McDonald's"] * 5 +
               ["Pizza Hut"] * 5 + ["Taco Bell"] * 5 + ["Wendy's"] * 5
}

# Create a DataFrame (named dv_df so the cleaned nutrition DataFrame `df` is not overwritten)
dv_df = pd.DataFrame(data)

# Create a bar chart
fig = px.bar(
    dv_df,
    x="Nutrient",
    y="DV%",
    color="Company",
    barmode="group",
    title="Daily Value Percentage of Key Nutrients for Fast Food Restaurants",
)

# Customize layout for better readability
fig.update_layout(
    xaxis_title="Nutrient",
    yaxis_title="DV%",
    font=dict(size=12),
    title_font=dict(size=18),
    legend_title="Company",
)

# Show the figure
fig.show()
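
The DV% figures in the cell above are typed in by hand. One way to derive comparable figures from data is sketched below, under two stated assumptions: the reference amounts are the FDA Daily Values for a 2,000-calorie diet, and DV% is defined as the mean per-item amount for each company divided by that reference. The `menu` frame is a hypothetical stand-in for the cleaned nutrition DataFrame.

```python
import pandas as pd

# FDA Daily Values for adults, based on a 2,000-calorie reference diet
DAILY_VALUES = {
    "Calories": 2000,
    "Sodium (mg)": 2300,
    "Total Fat (g)": 78,
    "Cholesterol (mg)": 300,
    "Carbs (g)": 275,
}

# Hypothetical rows standing in for the cleaned nutrition DataFrame.
# Note: in the original CSV the sodium column is spelled "Sodium  (mg)"
# with two spaces, so it would need renaming first.
menu = pd.DataFrame({
    "Company": ["KFC", "KFC", "Wendy's"],
    "Calories": [400.0, 600.0, 500.0],
    "Sodium (mg)": [1150.0, 1150.0, 920.0],
    "Total Fat (g)": [39.0, 39.0, 19.5],
    "Cholesterol (mg)": [60.0, 90.0, 75.0],
    "Carbs (g)": [55.0, 55.0, 27.5],
})

# Mean amount per menu item for each company
mean_per_item = menu.groupby("Company")[list(DAILY_VALUES)].mean()

# Divide each nutrient column by its Daily Value (aligned on column names)
dv_percent = (mean_per_item / pd.Series(DAILY_VALUES) * 100).round(1)

# Long format with the same three columns the bar chart expects
dv_long = dv_percent.reset_index().melt(
    id_vars="Company", var_name="Nutrient", value_name="DV%")

print(dv_percent)
```

`dv_long` has the Company, Nutrient, and DV% columns, so it could be passed to px.bar with the same x, y, color, and barmode arguments used above.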

Tree Map of Fast Food Companies

Code Description

This code generates an interactive Tree Map using Plotly to visualize the total calorie contribution of fast food companies, with color intensity representing sodium levels. The dataset is preprocessed to ensure numeric columns (Calories, Sodium (mg), and Sugars (g)) are clean and free of missing values. The data is grouped by company, summing up the relevant metrics for each brand. The tree map provides a hierarchical representation where:

  • Size of rectangles: Proportional to the total calories contributed by each company.
  • Color intensity: Represents each company's total sodium (summed over its menu items), shown on a continuous color scale.
  • Hover data: Displays detailed information about calories, sodium, and sugar for each company.
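
The preprocessing described above (coerce to numeric, drop incomplete rows, sum per company) can be sketched on a few rows. The toy values are hypothetical, and `"n/a"` stands in for whatever non-numeric entries the real file may contain.

```python
import pandas as pd

# Hypothetical toy rows; "n/a" stands in for a non-numeric entry.
raw = pd.DataFrame({
    "Company": ["A", "A", "B", "B"],
    "Calories": ["300", "n/a", "250", "450"],
    "Sodium (mg)": ["600", "700", "500", "900"],
    "Sugars (g)": ["5", "8", "12", "3"],
})
numeric_columns = ["Calories", "Sodium (mg)", "Sugars (g)"]

# Non-numeric strings become NaN instead of raising an error.
clean = raw.copy()
clean[numeric_columns] = clean[numeric_columns].apply(pd.to_numeric, errors="coerce")

# Rows with a NaN in any of the numeric columns are dropped before aggregating.
clean = clean.dropna(subset=numeric_columns)

# One row per company with the summed metrics.
totals = clean.groupby("Company")[numeric_columns].sum()
print(totals)
```

Note that dropping a row removes all of its metrics: company A's second item has a valid sodium value, but it no longer counts toward A's sodium total because its calorie entry was not numeric.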

Figure Insights¶

  1. McDonald's Dominance: McDonald's occupies the largest rectangle, indicating the highest total calories among the companies in the dataset. Because the total is a sum over menu items, it reflects the size of each menu as well as the calories per item.
  2. High Sodium Levels: Companies like KFC and Taco Bell exhibit higher sodium levels, as indicated by darker colors in the tree map.
  3. Smaller Contributions: Pizza Hut has a smaller rectangle, reflecting its lower calorie contribution relative to other companies.
  4. Nutritional Comparisons: The hover functionality allows users to compare key nutritional metrics (calories, sodium, and sugars) across fast food chains.
In [89]:
import pandas as pd
import plotly.express as px

# Load the dataset
data = pd.read_csv("FastFoodNutritionMenuV2.csv")

# Inspect columns to confirm correct column names
# print(data.columns)

# Rename columns to remove unwanted characters or spaces
data.rename(columns=lambda x: x.strip().replace("\n", " ").replace("  ", " "), inplace=True)

# Ensure relevant columns are numeric
numeric_columns = ["Calories", "Sodium (mg)", "Sugars (g)"]
for col in numeric_columns:
    data[col] = pd.to_numeric(data[col], errors='coerce')

# Drop rows with missing values in numeric columns
data = data.dropna(subset=numeric_columns)

# Group data by company and sum up relevant metrics
grouped_data = data.groupby("Company")[["Calories", "Sodium (mg)", "Sugars (g)"]].sum().reset_index()

# Create an interactive tree map with hover properties
fig = px.treemap(
    grouped_data,
    path=['Company'],  # Hierarchical path
    values='Calories',  # Size of rectangles based on total calories
    color='Sodium (mg)',  # Color based on sodium levels
    hover_data={'Sugars (g)': True, 'Calories': True, 'Sodium (mg)': True},  # Additional info on hover
    title="Tree Map of Fast Food Companies"
)

# Show the interactive figure
fig.show()

Finding out items containing sodium levels > 2000mg (Daily intake limit)¶

In [56]:
# Filter items with sodium above the 2000mg threshold used in this notebook
high_sodium = df[df["Sodium  (mg)"] > 2000][["Item", "Company", "Sodium  (mg)"]]

# Sort by sodium content descending
high_sodium_sorted = high_sodium.sort_values("Sodium  (mg)", ascending=False)

# Display results
print("Items exceeding the 2000mg sodium threshold:\n")
for _, row in high_sodium_sorted.iterrows():
    print(f"Company: {row['Company']}")
    print(f"Item: {row['Item']}")
    print(f"Sodium: {row['Sodium  (mg)']:.0f}mg")
    print("-" * 50)
Items exceeding the 2000mg sodium threshold:

Company: KFC
Item: Secret Recipe Fries (Family)
Sodium: 2890mg
--------------------------------------------------
Company: Burger King
Item: Spicy Chicken Nuggets- 20 pc
Sodium: 2840mg
--------------------------------------------------
Company: KFC
Item: BBQ Baked Beans (Family)
Sodium: 2810mg
--------------------------------------------------
Company: KFC
Item: Mashed Potatoes With Gravy (Family)
Sodium: 2590mg
--------------------------------------------------
Company: KFC
Item: KFC® Famous Bowl
Sodium: 2350mg
--------------------------------------------------
Company: McDonald’s
Item: Big Breakfast with Hotcakes (Large Size Biscuit)
Sodium: 2260mg
--------------------------------------------------
Company: Burger King
Item: BK™ Ultimate Breakfast Platter
Sodium: 2230mg
--------------------------------------------------
Company: KFC
Item: Macaroni & Cheese (Family)
Sodium: 2220mg
--------------------------------------------------
Company: Burger King
Item: Fully Loaded Biscuit
Sodium: 2190mg
--------------------------------------------------
Company: McDonald’s
Item: Big Breakfast with Hotcakes (Regular Size Biscuit)
Sodium: 2150mg
--------------------------------------------------
Company: Burger King
Item: Bacon King Sandwich
Sodium: 2150mg
--------------------------------------------------
Company: KFC
Item: Spicy Chicken Sandwich
Sodium: 2140mg
--------------------------------------------------
Company: McDonald’s
Item: Angus Bacon & Cheese
Sodium: 2070mg
--------------------------------------------------
Company: McDonald’s
Item: Angus Chipotle BBQ Bacon
Sodium: 2020mg
--------------------------------------------------
Company: Wendy’s
Item: 6 Piece Chicken Tenders
Sodium: 2020mg
--------------------------------------------------
Company: Wendy’s
Item: Two Sausage Biscuits
Sodium: 2020mg
--------------------------------------------------
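
The listing above can also be summarized per company without looping over rows. The sketch below counts items over the threshold and their share of each menu; the toy rows are hypothetical, and the column name keeps the double space used in this notebook.

```python
import pandas as pd

SODIUM_THRESHOLD_MG = 2000  # same cut-off as the filter above

# Hypothetical toy rows, not taken from the dataset.
menu = pd.DataFrame({
    "Company": ["A", "A", "A", "B", "B"],
    "Item": ["a1", "a2", "a3", "b1", "b2"],
    "Sodium  (mg)": [2500, 1800, 2100, 900, 2050],
})

# Boolean flag per item: True when sodium exceeds the threshold.
over = menu["Sodium  (mg)"] > SODIUM_THRESHOLD_MG

# Summing booleans counts the True values; their mean is the share of the menu.
summary = (
    menu.assign(over_limit=over)
        .groupby("Company")["over_limit"]
        .agg(items_over="sum", share_over="mean")
)
print(summary)
```

Reporting the share alongside the count helps separate chains with many high-sodium items from chains that simply list more items.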

Scatter Plot: Sodium Levels Across Fast Food Companies¶

Code Description¶

The code generates a scatter plot to visualize the relationship between sodium levels and calorie content for various fast food companies. Each subplot represents a different company, allowing for direct comparison. Key features of the code include:

  • Facet Plotting: Uses facet_col to create separate plots for each company, arranged in rows with three plots per row.
  • Sodium Limit Line: A horizontal dashed line at 2000mg indicates the daily recommended sodium intake, providing a visual benchmark for evaluating menu items.
  • Data Preprocessing: Ensures numeric conversion of sodium and calorie values, and removes duplicates for accurate plotting.
  • Customization: Adjusts layout for readability, including axis titles, background color, and grid lines.

Plot Insights¶

  1. Nutritional Comparison: The scatter plot highlights variations in sodium content relative to calories for each company, making it easy to identify items that exceed recommended sodium levels.
  2. Sodium Exceedance: Points above the red line mark items over the 2000mg limit. In the filtered listing above these come from KFC, Burger King, McDonald's, and Wendy's, with KFC contributing the most (six items).
  3. Calorie Correlation: The plot also reveals how calorie content correlates with sodium levels, providing insights into the nutritional profile of fast food offerings.
  4. Company-Specific Trends: Each facet allows for an in-depth view of how different brands compare in terms of high-sodium and high-calorie items.
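
The calorie and sodium relationship mentioned above can be quantified per company with a Pearson correlation. This is a minimal sketch on hypothetical toy rows; the column names follow the ones used in this notebook.

```python
import pandas as pd

# Hypothetical toy rows, not taken from the dataset.
menu = pd.DataFrame({
    "Company": ["A"] * 4 + ["B"] * 4,
    "Calories": [200, 400, 600, 800, 200, 400, 600, 800],
    "Sodium  (mg)": [400, 800, 1200, 1600, 900, 300, 1100, 500],
})

# Pearson correlation between calories and sodium within each company.
corr_by_company = {
    company: group["Calories"].corr(group["Sodium  (mg)"])
    for company, group in menu.groupby("Company")
}

for company, r in corr_by_company.items():
    print(f"{company}: r = {r:.2f}")
```

A value near 1 means sodium rises almost linearly with calories for that company, while a value near 0 means the facet shows no clear linear trend.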
In [57]:
def create_sodium_scatter():
    # Create scatter plot for sodium levels with facets
    scatter = px.scatter(
        df,
        x="Calories",
        y="Sodium  (mg)",
        color="Company",
        facet_col="Company",
        facet_col_wrap=3,  # 3 plots per row
        title="Scatter Plot: Sodium Levels",
        height=1000,
        width=1200,
        hover_data=["Item"]
    )

    # Add horizontal line for the daily sodium limit used in this notebook (2000mg)
    scatter.add_hline(
        y=2000,
        line_dash="dash",
        line_color="red",
        annotation_text="Sodium Limit (2000mg)",
        line_width=1,
        annotation=dict(
            font=dict(color="red", size=10),
            yshift=10
        )
    )

    # Customize layout
    scatter.update_layout(
        showlegend=False,
        title_x=0.5,
        title_font=dict(size=16, color="#2C3E50"),
        yaxis_title="Sodium (mg)",
        xaxis_title="Calories",
        plot_bgcolor='rgba(240, 240, 240, 0.5)',
        height=800
    )

    # Update facet layout
    scatter.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))

    # Update axes ranges and grid
    scatter.update_yaxes(range=[0, 3000], showgrid=True, gridwidth=1, gridcolor='rgba(128, 128, 128, 0.2)')
    scatter.update_xaxes(showgrid=True, gridwidth=1, gridcolor='rgba(128, 128, 128, 0.2)')

    return scatter

# Load and preprocess data
FILE_PATH = 'FastFoodNutritionMenuV2.csv'
df = pd.read_csv(FILE_PATH)
df.columns = [name.replace('\n', " ") for name in df.columns]
df.drop_duplicates(inplace=True)

# Convert sodium values to numeric
df["Sodium  (mg)"] = pd.to_numeric(df["Sodium  (mg)"], errors='coerce')
df["Calories"] = pd.to_numeric(df["Calories"], errors='coerce')

# Sort the data by "Calories" for proper x-axis ordering
df = df.sort_values(by="Calories")

# Create and display the plot
sodium_scatter = create_sodium_scatter()
sodium_scatter.show()

Scatter Plot: Sugar Levels Across Fast Food Companies¶

Code Description¶

This code creates a scatter plot to visualize sugar levels in relation to calorie content for various fast food companies. Each subplot represents a different company, allowing for direct comparison. Key features of the code include:

  • Facet Plotting: Utilizes facet_col to create separate plots for each company, arranged in rows with three plots per row.
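  • Sugar Limit Lines: Two horizontal dashed lines mark daily added-sugar limits: 25g for women (red) and 36g for men (orange). The dataset column is labelled Sugars (g), not added sugars, so the lines serve as a visual benchmark rather than a strict added-sugar comparison.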
  • Data Preprocessing: Ensures numeric conversion of sugar and calorie values, and removes duplicates for accurate plotting.
  • Customization: Adjusts layout for readability, including axis titles, background color, and grid lines.

Plot Insights¶

  1. Nutritional Comparison: The scatter plot highlights variations in sugar content relative to calories for each company, making it easy to identify items that exceed recommended sugar limits.
  2. Exceeding Sugar Limits: Many items from KFC and McDonald's surpass the recommended sugar limits, as indicated by points above the dashed lines.
  3. Calorie Correlation: The plot also reveals how calorie content correlates with sugar levels, providing insights into the nutritional profile of fast food offerings.
  4. Company-Specific Trends: Each facet allows for an in-depth view of how different brands compare in terms of high-sugar and high-calorie items.
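
To count how many items fall below, between, and above the two reference lines, the sugar values can be binned with `pd.cut`. The sketch below uses hypothetical toy rows; the band labels are illustrative names.

```python
import pandas as pd

# Hypothetical toy rows, not taken from the dataset.
menu = pd.DataFrame({
    "Item": ["i1", "i2", "i3", "i4"],
    "Sugars (g)": [10, 25, 30, 50],
})

# Bins are right-inclusive, so exactly 25 g falls in the lowest band.
bands = pd.cut(
    menu["Sugars (g)"],
    bins=[-float("inf"), 25, 36, float("inf")],
    labels=["<= 25 g", "25-36 g", "> 36 g"],
)

# Count items per band, keeping the bands in threshold order.
band_counts = bands.value_counts().sort_index()
print(band_counts)
```

Grouping the same bands by company would give a per-chain table that matches what each facet shows visually.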
In [78]:
def create_sugar_scatter():
    # Create scatter plot for sugar levels with facets
    scatter = px.scatter(
        df,
        x="Calories",
        y="Sugars (g)",
        color="Company",
        facet_col="Company",
        facet_col_wrap=3,  # 3 plots per row
        title="Scatter Plot: Sugar Levels",
        height=1000,
        width=1150,
        hover_data=["Item"]
    )

    # Add horizontal line for recommended sugar limit (25g)
    scatter.add_hline(
        y=25,
        line_dash="dash",
        line_color="red",
        annotation_text="Women's Added Sugar Limit (25g)",
        line_width=1,
        annotation=dict(
            font=dict(color="red", size=10),
            yshift=10,
            xshift=0
        )
    )

    # Add second horizontal line for the men's added sugar limit (36g)
    scatter.add_hline(
        y=36,
        line_dash="dash",
        line_color="orange",
        annotation_text="Men's Added Sugar Limit (36g)",
        line_width=1,
        annotation=dict(
            font=dict(color="orange", size=10),
            yshift=-20,
            xshift=0
        )
    )

    # Customize layout
    scatter.update_layout(
        showlegend=False,
        title_x=0.5,
        title_font=dict(size=16, color="#2C3E50"),
        yaxis_title="Sugar (g)",
        xaxis_title="Calories",
        plot_bgcolor='rgba(240, 240, 240, 0.5)',
        height=800
    )

    # Update facet layout
    scatter.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))

    # Update axes ranges and grid
    scatter.update_yaxes(range=[0, 65], showgrid=True, gridwidth=1, gridcolor='rgba(128, 128, 128, 0.2)')
    scatter.update_xaxes(showgrid=True, gridwidth=1, gridcolor='rgba(128, 128, 128, 0.2)')

    return scatter

# Load and preprocess data
FILE_PATH = 'FastFoodNutritionMenuV2.csv'
df = pd.read_csv(FILE_PATH)
df.columns = [name.replace('\n', " ") for name in df.columns]
df.drop_duplicates(inplace=True)

# Convert values to numeric
df["Sugars (g)"] = pd.to_numeric(df["Sugars (g)"], errors='coerce')
df["Calories"] = pd.to_numeric(df["Calories"], errors='coerce')

# Sort the data by "Calories" for proper x-axis ordering
df = df.sort_values(by="Calories")

# Create and display the plot
sugar_scatter = create_sugar_scatter()
sugar_scatter.show()

Interactive Map of 10,000 Fast Food Restaurants in the United States¶

Code Description¶

The provided code uses the Folium library to create an interactive map that visualizes the locations of 10,000 fast food restaurants across the United States. Key features of the code include:

  1. Map Initialization:

    • The map is centered on the geographical center of the U.S. (latitude: 39.8283, longitude: -98.5795) with a default zoom level of 4.
  2. Marker Clusters:

    • FastMarkerCluster: Efficiently renders large datasets by clustering markers dynamically for better performance.
    • MarkerCluster: Adds detailed markers with popups containing restaurant-specific information, such as name, address, city, province, and categories.
  3. Popups and Tooltips:

    • Each marker includes a popup displaying detailed restaurant information and a tooltip showing the restaurant's name for quick identification.
  4. Layer Control:

    • A layer control widget allows users to toggle between different layers (e.g., clusters) for better interaction.
  5. Customization:

    • Markers are styled with red icons and a "cutlery" symbol to represent food-related locations.
  6. Output:

    • The map is rendered inline in the notebook output; Folium's `m.save()` can additionally write it to a standalone HTML file that opens in a web browser.

Insights from the Map¶

  1. Geographic Distribution:

    • The map reveals that fast food restaurants are densely clustered in urban areas and along major highways, reflecting their accessibility and convenience for travelers and city dwellers.
  2. Regional Hotspots:

    • States like California, Texas, and Florida show significant concentrations of fast food chains, highlighting their population density and demand for quick-service dining options.
  3. Category Diversity:

    • The "categories" field in the popup indicates a variety of offerings, from burgers and pizza to specialty cuisines, showcasing the diversity in fast food menus across different regions.
  4. Rural vs. Urban Presence:

    • While urban areas dominate in terms of density, rural regions also have scattered fast food outlets, indicating their importance as essential dining options in less populated areas.
  5. Potential Health Implications:

    • The widespread availability of fast food across the country underscores its role in shaping dietary habits and public health outcomes, particularly in areas with limited access to healthier alternatives.
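
The regional hotspot observation above can be checked numerically by counting locations per state. This is a minimal sketch on hypothetical toy rows; it assumes the `province` column holds the state code, as the popups in this notebook suggest.

```python
import pandas as pd

# Hypothetical toy rows, not taken from fastfood.csv.
locations = pd.DataFrame({
    "name": ["r1", "r2", "r3", "r4", "r5", "r6"],
    "province": ["CA", "CA", "TX", "FL", "CA", "TX"],
})

# Number of restaurants per state, sorted from most to fewest.
per_state = locations["province"].value_counts()

# The top entries are the candidate hotspots.
top_states = per_state.head(3)
print(top_states)
```

Raw counts tend to follow population, so a per-capita comparison would need a state population table, which is not part of this notebook.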
In [1]:
import folium
import pandas as pd
from folium.plugins import MarkerCluster, FastMarkerCluster

# Load the data
df = pd.read_csv('fastfood.csv')

# Create a map centered on the United States
m = folium.Map(location=[39.8283, -98.5795], zoom_start=4)

# Create a FastMarkerCluster for efficient rendering of many points
fastmarker_cluster = FastMarkerCluster(data=list(zip(df['latitude'], df['longitude'])))
fastmarker_cluster.add_to(m)

# Create a regular MarkerCluster for more detailed information
# (this plots the same restaurants a second time, now with popups and tooltips)
marker_cluster = MarkerCluster(name="Fast Food Restaurants")

# Add markers for each restaurant
for idx, row in df.iterrows():
    popup_content = f"""
    <b>{row['name']}</b><br>
    Address: {row['address']}<br>
    City: {row['city']}<br>
    Province: {row['province']}<br>
    Categories: {row['categories']}
    """

    folium.Marker(
        location=[row['latitude'], row['longitude']],
        popup=folium.Popup(popup_content, max_width=300),
        tooltip=row['name'],
        icon=folium.Icon(color='red', icon='cutlery', prefix='fa')
    ).add_to(marker_cluster)

marker_cluster.add_to(m)

# Add layer control
folium.LayerControl().add_to(m)

# Display the map inline (m.save() would write it to an HTML file instead)
m

Interactive Map: Fast Food Restaurant Locations (Filtered by Six Major Chains)¶

Code Description¶

This code creates an interactive map using the Folium library to visualize the locations of six major fast food chains: McDonald's, Burger King, Wendy's, KFC, Taco Bell, and Pizza Hut. Each chain is represented as a separate layer with distinct marker colors for easy identification. Key features of the code include:

  1. Data Filtering:

    • The dataset is filtered to include only the specified six restaurants.
  2. Color Mapping:

    • Each restaurant is assigned a unique marker color:
      • McDonald's: Red
      • Burger King: Blue
      • Wendy's: Green
      • KFC: Orange
      • Taco Bell: Purple
      • Pizza Hut: Dark Red
  3. Layered Visualization:

    • Separate layers are created for each restaurant, allowing users to toggle visibility using a layer control widget.
  4. Marker Clusters:

    • Markers are clustered for efficient rendering and better performance on maps with a high density of locations.
  5. Popups and Tooltips:

    • Each marker includes a popup displaying detailed restaurant information (name, address, city, province) and a tooltip summarizing the location.
  6. Interactive Features:

    • A layer control widget enables users to toggle between different restaurant layers.

Insights from the Map¶

  1. Geographic Distribution:

    • The map highlights the widespread presence of these six major fast food chains across the United States.
    • Urban areas and major highways show higher densities of fast food outlets, reflecting their strategic placement for accessibility.
  2. Restaurant-Specific Trends:

    • McDonald’s and Burger King dominate in terms of the number of locations, as evident from their dense marker clusters.
    • Pizza Hut has fewer locations compared to other chains, suggesting a more niche presence.
  3. Layered Comparison:

    • The ability to toggle between layers allows for direct comparison of geographic coverage among the chains.
    • This feature is particularly useful for identifying regions where certain chains are underrepresented or overrepresented.
  4. Potential Applications:

    • This map can be used by researchers to study fast food availability in specific regions or by businesses to identify opportunities for expansion.
    • It also provides insights into how fast food accessibility might correlate with dietary habits and public health outcomes in different areas.
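
An exact-match filter on the restaurant name only keeps rows spelled exactly like the six chain names. If the location data contains spelling variants (an assumption, not verified here), a normalization step such as the sketch below would map them to one canonical name before filtering and counting. The helper `name_key` and the sample names are hypothetical.

```python
import re

import pandas as pd

CHAINS = ["McDonald's", "Burger King", "Wendy's", "KFC", "Taco Bell", "Pizza Hut"]


def name_key(name: str) -> str:
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", str(name).lower())


# Lookup from normalized key to the canonical chain name.
KEY_TO_CHAIN = {name_key(chain): chain for chain in CHAINS}

# Hypothetical name variants, not taken from fastfood.csv.
names = pd.Series(
    ["McDonald's", "McDonalds", "Mcdonald's", "BURGER KING", "Subway", "Taco Bell"]
)

# Names outside the six chains map to NaN and are ignored by value_counts.
canonical = names.map(name_key).map(KEY_TO_CHAIN)
counts = canonical.value_counts()
print(counts)
```

The resulting counts per chain also give a direct check on which chains have the most locations.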
In [2]:
import folium
import pandas as pd
from folium.plugins import MarkerCluster

# Load the CSV file
df = pd.read_csv('fastfood.csv')

# Filter the dataset to include only the six specified restaurants
selected_restaurants = ['McDonald\'s', 'Burger King', 'Wendy\'s', 'KFC', 'Taco Bell', 'Pizza Hut']
filtered_df = df[df['name'].isin(selected_restaurants)]

# Define a color mapping for each restaurant
color_mapping = {
    "McDonald's": 'red',
    "Burger King": 'blue',
    "Wendy's": 'green',
    "KFC": 'orange',
    "Taco Bell": 'purple',
    "Pizza Hut": 'darkred'
}

# Create a base map centered on the United States
m = folium.Map(location=[39.8283, -98.5795], zoom_start=4)

# Create a dictionary to store separate layers for each restaurant
layers = {}

for restaurant in selected_restaurants:
    # Filter data for the current restaurant
    restaurant_data = filtered_df[filtered_df['name'] == restaurant]

    # Create a feature group for the restaurant
    layer = folium.FeatureGroup(name=restaurant)
    marker_cluster = MarkerCluster().add_to(layer)

    # Add markers for each location of the current restaurant
    for _, row in restaurant_data.iterrows():
        popup_content = f"""
        <b>{row['name']}</b><br>
        Address: {row['address']}<br>
        City: {row['city']}<br>
        Province: {row['province']}<br>
        """
        tooltip_text = f"{row['name']} ({row['city']}, {row['province']})"

        folium.Marker(
            location=[row['latitude'], row['longitude']],
            popup=folium.Popup(popup_content, max_width=300),
            tooltip=tooltip_text,
            icon=folium.Icon(color=color_mapping[restaurant], icon='cutlery', prefix='fa')
        ).add_to(marker_cluster)

    # Add the layer to the map and dictionary
    layers[restaurant] = layer
    layer.add_to(m)

# Add layer control to toggle between restaurants
folium.LayerControl(collapsed=False).add_to(m)

# Display the map inline (m.save() would write it to an HTML file instead)
m

Exporting the above visualizations to .html files for easier viewing¶

In [80]:
import plotly.io as pio
import pandas as pd
import plotly.express as px

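# Load the dataset
df = pd.read_csv('FastFoodNutritionMenuV2.csv')  # Replace with your dataset path

# Coerce Calories to numeric so the sums below do not trip over non-numeric entries
df["Calories"] = pd.to_numeric(df["Calories"], errors="coerce")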

# Function to save individual visualizations
def save_visualization(fig, filename):
    pio.write_html(fig, file=filename, auto_open=False)

# Create and save a count histogram of menu items per company (no KDE is drawn)
hist = px.histogram(df, x='Company', text_auto=True,
                    title="Company Frequency Distribution (Histogram)", color="Company")

hist.update_layout(
    xaxis_title="Companies",
    yaxis_title="Frequency Count",
    font=dict(size=12, color="black"),
    showlegend=False  # Remove legend
)
save_visualization(hist, 'histogram.html')

# Create and save pie chart
company_value_counts = df['Company'].value_counts()
pie_chart = px.pie(company_value_counts,
                   names=company_value_counts.index,
                   values=company_value_counts.values,
                   hole=0.4,
                   height=600,
                   title="Company Frequency Distribution (Pie Chart)",
                   labels={'index': 'Companies', 'value': 'Frequency Count'})

pie_chart.update_traces(
    textinfo='percent+label',
    hoverinfo='label+percent+value',
    textfont=dict(size=12),
)
pie_chart.update_layout(showlegend=False)  # Remove legend
save_visualization(pie_chart, 'pie_chart.html')

# Create and save additional visualizations
def create_histogram_and_pie(feature_name):
    # Histogram
    hist = px.histogram(df, x="Company", y=feature_name,
                        title=f"Distribution of {feature_name} by Company",
                        text_auto=True, nbins=50, color="Company", height=600)

    hist.update_layout(
        xaxis_title="Company",
        yaxis_title=feature_name,
        showlegend=False  # Remove legend
    )
    hist.update_traces(marker=dict(line=dict(color='white', width=0.5)))
    save_visualization(hist, f'histogram_{feature_name}.html')

    # Pie chart
    pie_chart = px.pie(df, names="Company", values=feature_name,
                       hole=0.4, title=f"Contribution of Each Company to {feature_name}",
                       labels={'Company': 'Companies',
                               feature_name: f'Total {feature_name}'},
                       )

    pie_chart.update_traces(textinfo='percent+label', textfont_size=12)
    pie_chart.update_layout(showlegend=False)  # Remove legend
    save_visualization(pie_chart, f'pie_chart_{feature_name}.html')

create_histogram_and_pie("Calories")

# Combine all saved HTML files into one
with open("combined_visualizations.html", "w") as combined_file:
    for html_file in ['histogram.html', 'pie_chart.html', 'histogram_Calories.html', 'pie_chart_Calories.html']:
        with open(html_file, "r") as f:
            combined_file.write(f.read())
In [81]:
import plotly.io as pio
import pandas as pd
import plotly.express as px

# Load and preprocess data
FILE_PATH = 'FastFoodNutritionMenuV2.csv'
df = pd.read_csv(FILE_PATH)
df.columns = [name.replace('\n', " ") for name in df.columns]
df.drop_duplicates(inplace=True)

# Convert values to numeric
df["Sodium  (mg)"] = pd.to_numeric(df["Sodium  (mg)"], errors='coerce')
df["Calories"] = pd.to_numeric(df["Calories"], errors='coerce')
df["Sugars (g)"] = pd.to_numeric(df["Sugars (g)"], errors='coerce')

# Sort the data by "Calories" for proper x-axis ordering
df = df.sort_values(by="Calories")

# Function to save individual visualizations
def save_visualization(fig, filename):
    pio.write_html(fig, file=filename, auto_open=False)

# Sodium scatter plot
def create_sodium_scatter():
    scatter = px.scatter(
        df,
        x="Calories",
        y="Sodium  (mg)",
        color="Company",
        facet_col="Company",
        facet_col_wrap=3,
        title="Scatter Plot: Sodium Levels",
        height=1000,
        width=1200,
        hover_data=["Item"]
    )
    scatter.add_hline(
        y=2000,
        line_dash="dash",
        line_color="red",
        annotation_text="Sodium Limit (2000mg)",
        line_width=1,
        annotation=dict(font=dict(color="red", size=10), yshift=10)
    )
    scatter.update_layout(
        showlegend=False,
        title_x=0.5,
        title_font=dict(size=16, color="#2C3E50"),
        yaxis_title="Sodium (mg)",
        xaxis_title="Calories",
        plot_bgcolor='rgba(240, 240, 240, 0.5)',
        height=800
    )
    scatter.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))
    scatter.update_yaxes(range=[0, 3000], showgrid=True, gridwidth=1, gridcolor='rgba(128, 128, 128, 0.2)')
    scatter.update_xaxes(showgrid=True, gridwidth=1, gridcolor='rgba(128, 128, 128, 0.2)')
    return scatter

sodium_scatter = create_sodium_scatter()
save_visualization(sodium_scatter, 'sodium_scatter.html')

# Sugar scatter plot
def create_sugar_scatter():
    scatter = px.scatter(
        df,
        x="Calories",
        y="Sugars (g)",
        color="Company",
        facet_col="Company",
        facet_col_wrap=3,
        title="Scatter Plot: Sugar Levels",
        height=1000,
        width=1150,
        hover_data=["Item"]
    )
    scatter.add_hline(
        y=25,
        line_dash="dash",
        line_color="red",
        annotation_text="Women's Added Sugar Limit (25g)",
        line_width=1,
        annotation=dict(font=dict(color="red", size=10), yshift=10)
    )
    scatter.add_hline(
        y=36,
        line_dash="dash",
        line_color="orange",
        annotation_text="Men's Added Sugar Limit (36g)",
        line_width=1,
        annotation=dict(font=dict(color="orange", size=10), yshift=-20)
    )
    scatter.update_layout(
        showlegend=False,
        title_x=0.5,
        title_font=dict(size=16, color="#2C3E50"),
        yaxis_title="Sugar (g)",
        xaxis_title="Calories",
        plot_bgcolor='rgba(240, 240, 240, 0.5)',
        height=800
    )
    scatter.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))
    scatter.update_yaxes(range=[0, 65], showgrid=True, gridwidth=1, gridcolor='rgba(128, 128, 128, 0.2)')
    scatter.update_xaxes(showgrid=True, gridwidth=1, gridcolor='rgba(128, 128, 128, 0.2)')
    return scatter

sugar_scatter = create_sugar_scatter()
save_visualization(sugar_scatter, 'sugar_scatter.html')

# Combine all saved HTML files into one
with open("combined_visualizations_2.html", "w") as combined_file:
    for html_file in ['sodium_scatter.html', 'sugar_scatter.html']:
        with open(html_file, "r") as f:
            combined_file.write(f.read())
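
Concatenating complete HTML files, as done above, produces a page with several `<html>` documents back to back. A minimal alternative sketch is shown below: each figure is exported as an HTML fragment and the fragments are wrapped in a single page. It relies on Plotly's `to_html(full_html=False, include_plotlyjs=...)`; the helper name `combine_figures` and the page title are hypothetical.

```python
from pathlib import Path


def combine_figures(figures, out_path, title="Combined visualizations"):
    """Write several figures into one well-formed HTML page.

    Each figure only needs a Plotly-style
    ``to_html(full_html=False, include_plotlyjs=...)`` method.
    """
    parts = []
    for i, fig in enumerate(figures):
        # Load plotly.js once (from the CDN) with the first figure only.
        js = "cdn" if i == 0 else False
        parts.append(fig.to_html(full_html=False, include_plotlyjs=js))

    page = (
        '<!DOCTYPE html>\n<html>\n<head><meta charset="utf-8">'
        f"<title>{title}</title></head>\n<body>\n"
        + "\n".join(parts)
        + "\n</body>\n</html>\n"
    )
    Path(out_path).write_text(page, encoding="utf-8")
    return page
```

With the figures from the cell above, a call could look like `combine_figures([sodium_scatter, sugar_scatter], "combined_visualizations_2.html")`. Loading plotly.js from the CDN keeps the file small but requires an internet connection when the page is opened.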